RELIABLE MULTICAST USING MERGED ACKNOWLEDGMENTS



Technical Field 

The invention relates to communication of information and more particularly to multicast operations.

Background Art

In current computing environments, especially networked environments, a source node on the
network may wish to supply a plurality of destination nodes with the same information. In such situations,
some systems provide a multicast capability in which the source node can send multiple destination nodes the
same information at the same time. In such multicast operations, any number of multiple targets can receive
the multicast information.

Referring to Fig. 1, a multicast operation is illustrated in which an initiator node I0 simultaneously
sends the same information to target nodes T0, T1, and T2. Because the destination or target nodes can receive
the multicast information simultaneously, the multicast operation is time efficient.

One difficulty with simultaneous multicasting is that it may be difficult for the initiator
node that sends the information to determine whether the target nodes successfully received it. Thus,
the operation is unreliable in the sense that the initiator cannot determine if the transmission was successful.
If the receiving nodes send acknowledgments indicating successful receipt of the multicast information, there
would be a tendency for the acknowledgments to collide or otherwise contend for resources of the
communication medium. That is because the targets would likely send the acknowledgments to the initiator
node at the same time. In a switched synchronous network, sending such acknowledgments could result in
undesirable collisions and possible loss of acknowledgment information. In other systems, the
acknowledgments may be buffered within the switch as collisions occur, or require retry as some targets would
be unable to obtain the communication medium to send the acknowledgment. In either of those situations, the
advantage of time efficiency is diminished if acknowledgments take a long time relative to the original
multicast due to contention for resources of the communication medium connecting the sending and receiving
nodes.

One way to avoid such contentions and/or collisions is to provide the information sequentially as
shown in Fig. 2, rather than simultaneously, as shown in Fig. 1. In the sequential operation, the initiator node
I0 successively sends the same information at 201, 202 and 203 to the target nodes T0, T1, and T2. The target
nodes respond sequentially with acknowledgments at 204, 205 and 206. Because the acknowledgments are
sequential, they do not compete with each other for communication medium resources. Thus, the operation is
reliable in the sense that the initiator can determine if the transmission was successful. However, the
sequential nature of the operation for both the transmission of the information and the transmission of the
acknowledgments eliminates any efficiency which could be gained from a true multicast operation in which
multicast information is sent simultaneously. Thus, there is a relatively long latency for completion of the
entire operation.

For certain time-critical multicast operations, it is important to minimize latency. For example, for
time-critical multicast operations such as synchronization of clocks in a network, coherency protocols, and
operations in databases/transaction systems such as commit or abort, minimizing latency would be
advantageous.

Accordingly, it would be desirable to provide a multicast operation that is both efficient and reliable.

DISCLOSURE OF THE INVENTION

Accordingly, in one embodiment, the invention provides a method of multicasting that
simultaneously sends multicast information from a source to a plurality of targets. The targets respond to the
multicast information by sending acknowledgments that indicate receipt of the multicast information. The
acknowledgments are merged into a merged acknowledgment which is then supplied to the source. The
source can determine from the merged acknowledgment whether the targets successfully received the multicast
information.

In an embodiment, the multicast information and acknowledgments are transmitted across a network
switch and the switch merges the acknowledgments before forwarding the merged acknowledgment to the
source.

In another embodiment, a method is provided for transmitting information between an initiator node
in a network and a plurality of target nodes. The method includes transmitting information from the initiator
node to the target nodes simultaneously; simultaneously sending acknowledgments from the multiple nodes
indicating receipt of the information; combining the acknowledgments and sending the combined
acknowledgments to the initiator node to indicate receipt of the multicast information by the target nodes.

In another embodiment, the invention provides a data network that includes a sending node and a
plurality of receiving nodes coupled to simultaneously receive information from the sending node during a
multicast operation and coupled to respectively provide acknowledgments of successful receipt of the multicast
information. A switching medium supplies the multicast information to the respective receiving nodes
simultaneously. Logic in the switching medium receives and combines the respective acknowledgments to
provide a combined acknowledgment to the sending node. The combined acknowledgment may be a logical
combination of the individual acknowledgments.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood, and its numerous objects, features, and advantages
made apparent to those skilled in the art by referencing the accompanying drawings.

Fig. 1 illustrates operation of an unreliable multicast operation in which no acknowledgments are
provided by the targets.



Fig. 2 illustrates a sequential operation.

Fig. 3 illustrates operation of a reliable simultaneous multicast operation.

Fig. 4 illustrates an embodiment in which a multi-port switch is used for a multicast operation. 

Fig. 5 illustrates an embodiment in which a multi-port switch is used to merge the acknowledgments, 
to indicate successful completion of the multicast operation. 

Fig. 6 illustrates an embodiment in which a multi-port switch is used to merge the acknowledgments,
to indicate a failed multicast operation.

Fig. 7 illustrates how single-bits can be concatenated into a vector. 

Fig. 8 illustrates the transmission portion of a merged single-bit acknowledgment approach.

Fig. 9 illustrates the acknowledgment portion of the merged single-bit acknowledge operation.

The use of the same reference symbols in different drawings indicates similar or identical items.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

Referring to Fig. 3, operation of a reliable multicast operation is illustrated. Assume the system
includes multiple nodes including the illustrated initiator node I0 and three target nodes T0, T1 and T2. The
initiator node I0 sends information (data) to the three targets T0, T1 and T2 simultaneously, i.e., the initiator
node I0 multicasts the information to the three targets. Each target, assuming successful receipt, sends back an
acknowledgment (ack) to the initiator node I0. In order for the initiator node I0 to
receive the simultaneously sent acknowledgments, the acknowledgments are merged and then provided to the
initiator node. The merger operation is described further herein.

Referring to Fig. 4, the first part of a reliable multicast operation according to an embodiment of the
invention is illustrated. In the first part of the multicast operation, the multicast information, in the form of
packet(s) P, is sent from initiator node N1 through input port 403 to target nodes N5, N6 and N7 across
multiport switch 401. Note that packet(s) P may be one or more packets comprising one or more bytes of data
and/or control information.

Referring to Fig. 5, the acknowledge phase of the multicast operation is illustrated. Nodes N5, N6
and N7, which received the multicast packet(s) P, respectively send acknowledge packets (ack) 501, 503 and
505 to node N1, which sent the multicast packet(s) P. Note that the exemplary acknowledge packets are
shown in simplified form without information such as address, type of operation or other control information
that would typically be associated with such a packet. Further note that a host typically contains both an
initiator node and a target node and that the initiator and target share the input and output port of the switch.
For example, N1 and N5 belong to the same host and send packets to input port 403 and receive packets from
output port 405.

The exemplary multiport switch 401 includes four possible inputs and four possible outputs. Thus, in
the embodiment illustrated in Fig. 5, the acknowledge packet (ack) from each multicast target node includes a
vector of four bits, one bit corresponding to one of four possible output ports or targets on the switch. As
illustrated in Fig. 5, the leftmost bit in the vector corresponds to node N5, the next bit to node N6, etc. Thus,
when node N5 acknowledges the multicast, it sets the leftmost bit in its acknowledge vector 501 to indicate
that N5 successfully received the multicast packet(s) P. Node N6 sets the bit second from the left in its
acknowledge vector 503 to indicate that it successfully received the multicast packet(s) P. Node N7 sets the
bit third from the left in its acknowledge vector 505.

Output port 507 merges the acknowledge packets received respectively from nodes N5, N6 and N7.
As illustrated in Fig. 5, that can be accomplished by ORing together the acknowledge packets in OR logic in
output port 507. When ORed together, the merged acknowledgment packet 509 is generated and supplied to
node N1. Node N1 can determine from the three bits set in merged acknowledge packet 509 that nodes N5,
N6 and N7 successfully received the multicast packet(s) P. Thus, multiport switch 401 can provide a reliable
and efficient multicast operation, since the acknowledge packets can be sent over the switch efficiently. That
is made possible by the merging implemented in the output port.
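
The OR merge of Fig. 5 can be illustrated with a minimal software sketch. It is not the switch logic itself: the function names `ack_vector` and `merge_acks` and the integer encoding of the four-bit vector are assumptions made only for this illustration.

```python
from functools import reduce
from operator import or_

PORTS = 4  # the exemplary switch has four possible outputs


def ack_vector(position: int, received: bool = True) -> int:
    """Return a 4-bit ack with the bit at `position` (0 = leftmost) set on success."""
    return (1 << (PORTS - 1 - position)) if received else 0


def merge_acks(acks) -> int:
    """OR the individual acknowledge vectors into one merged vector."""
    return reduce(or_, acks, 0)


# N5, N6 and N7 occupy the first, second and third bit from the left.
acks = [ack_vector(0), ack_vector(1), ack_vector(2)]
merged = merge_acks(acks)
print(format(merged, "04b"))  # 1110
```

With all three targets acknowledging, the merged vector has the three leftmost bits set, which is the information node N1 reads from merged acknowledge packet 509.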

Referring to Fig. 6, another operation of the multicast acknowledge is illustrated when some of the
target nodes of the multicast operation fail to correctly receive the multicast packet P. That may be the result
of, e.g., uncorrectable errors detected by the receiving node. As can be seen, only node N6 correctly received
the multicast packet(s) P as indicated by the "0100" in its acknowledge packet. When the acknowledge
packets from N5, N6 and N7 are ORed together, merged acknowledge packet 601 results, which indicates that
errors were detected by two nodes (N5 and N7). Using that information, the node initiating the multicast
can take appropriate action in response to the detected errors, such as resending the multicast packet P to the
nodes that failed.
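
One way the initiator might interpret a merged vector such as packet 601 is sketched below. The table `TARGETS`, the helper `bit_for`, and the function `failed_targets` are hypothetical names introduced for this illustration; the text does not specify how the initiator decodes the vector, so comparing against the set of addressed targets is an assumption.

```python
PORTS = 4
# Bit position from the left for each target, following Fig. 5.
TARGETS = {"N5": 0, "N6": 1, "N7": 2}


def bit_for(position: int) -> int:
    """Mask with only the bit at `position` (0 = leftmost) set."""
    return 1 << (PORTS - 1 - position)


def failed_targets(merged: int, expected: int) -> list:
    """Names of targets whose bit was expected but is absent from the merged ack."""
    missing = expected & ~merged
    return [name for name, pos in TARGETS.items() if missing & bit_for(pos)]


expected = bit_for(0) | bit_for(1) | bit_for(2)  # 1110: N5, N6 and N7 were addressed
merged = 0b0100                                  # Fig. 6: only N6 acknowledged
print(failed_targets(merged, expected))          # ['N5', 'N7']
```

The returned list identifies the nodes to which the multicast packet could be resent.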

As would be known in the art, there are many other ways to encode the sources of the
acknowledgments and to merge the acknowledge packets. For example, while the OR operation is possible, an
embodiment could simply select the relevant bit from each output port acknowledge vector for inclusion in a
merged acknowledge vector. Referring to Fig. 7, an example is shown in which single bits from each of the
targets are merged into a vector. More particularly, each bit 701, 702 and 703 is concatenated to form vector
704, which is presented to the source to indicate which targets successfully received the multicast data.
Alternatively, the switch could provide a count of the number of acknowledging multicast targets that
indicated successful receipt, although that implementation would likely require more logic.
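
The two alternatives just described, concatenation and counting, can be sketched as follows. The function names and the sample bit values are hypothetical; the text does not give the bit values shown in Fig. 7.

```python
def concat_bits(bits) -> int:
    """Concatenate single-bit acks into one vector, first target leftmost."""
    vector = 0
    for b in bits:
        vector = (vector << 1) | (b & 1)
    return vector


def count_acks(bits) -> int:
    """Count how many targets indicated successful receipt."""
    return sum(b & 1 for b in bits)


bits = [1, 0, 1]                         # hypothetical values for bits 701, 702, 703
print(format(concat_bits(bits), "03b"))  # 101
print(count_acks(bits))                  # 2
```

The concatenated vector preserves which target acknowledged, while the count only reports how many did.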

In a typical system, the input ports (or the control logic associated with the input ports) are aware of
the multicast operation from information contained in a packet header. From that information, the control
logic knows to connect the input port to the appropriate output ports. There are various approaches that could
be used to alert the output port to merge the acknowledgments received by the input ports from the various
targets. For example, an acknowledge packet may be marked as a multicast acknowledgment. Assuming that
the packets to be merged arrive at the input ports simultaneously, the output port merges those packets that are
destined for it and appropriately marked. Alternatively, e.g., in a pipelined network, the switch can remember
that it scheduled a multicast data transfer and merge the acknowledge packets at a particular pipeline stage in
the future. It is also possible for the switch to merge acknowledge packets whenever
multiple acknowledge packets exist for the same output port. That assumes that acknowledge packets to
be merged arrive simultaneously. Thus, a multicast acknowledge would be presumed in such situations. Note
that the switch settings for forwarding the acknowledgments can be inferred from settings for forwarding the
multicast data.
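
The first approach above, merging packets that are marked as multicast acknowledgments and destined for the same output port, might be modeled roughly as follows. The `AckPacket` fields and the function `merge_for_port` are assumptions for illustration only; an actual switch would implement this in port logic rather than software, and the sketch treats one list as the set of packets arriving in the same cycle.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class AckPacket:
    dest_port: int              # output port the packet is destined for
    vector: int                 # acknowledge bit vector
    multicast_ack: bool = True  # hypothetical marker carried in the packet


def merge_for_port(arrivals: Iterable[AckPacket], port: int) -> Optional[int]:
    """Merge marked acks that arrive together for `port`; None if there are none."""
    selected = [p for p in arrivals if p.dest_port == port and p.multicast_ack]
    if not selected:
        return None
    merged = 0
    for p in selected:
        merged |= p.vector
    return merged


arrivals = [
    AckPacket(dest_port=0, vector=0b1000),
    AckPacket(dest_port=0, vector=0b0100),
    AckPacket(dest_port=0, vector=0b0010),
    AckPacket(dest_port=2, vector=0b0001, multicast_ack=False),
]
print(format(merge_for_port(arrivals, 0), "04b"))  # 1110
```

The unmarked packet for port 2 is left out of any merge, which mirrors the requirement that packets be both destined for the port and appropriately marked.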

It is also possible to merge acknowledge packets into an acknowledge packet containing a single bit
rather than a bit vector, which is then forwarded on to the initiator node. Atomic operations are one
application for a merged single bit acknowledge. Referring to Figures 8 and 9, operation of a merged single
bit acknowledge is illustrated. In Figure 8 a multicast operation sends data from initiator node N1 to target
nodes N5, N6 and N7. A forwarding mask 801 is generated that indicates which of the possible targets
received the multicast data. That forwarding mask is utilized in merging the acknowledgments into a single bit
as illustrated in Figure 9.

Referring to Figure 9, node N5 sends back acknowledgment 901, node N6 sends back
acknowledgment 902, and node N7 sends back acknowledgment 903 as shown. Note that acknowledgment
902 indicates that node N6 failed to properly receive the multicast data. The merging is accomplished as
follows. The individual acknowledgments are inverted and logically combined in AND gates 904 with the
forwarding mask 801. The outputs of AND gates 904 are then logically combined in NOR gate 905 to provide
the single bit acknowledgment 906 to the initiating node N1. In the example illustrated in Figure 9, the zero
acknowledgment 902 from node N6 causes the single bit acknowledgment to be a zero, indicating that a failure
occurred. Note that while the acknowledgments 901, 902, and 903 from nodes N5, N6 and N7 are shown as
single bits, as one of ordinary skill in the art would understand, the acknowledgments can be in various
forms, e.g., an acknowledge packet indicating successful receipt or an acknowledge packet indicating
unsuccessful receipt (NACK). Further, the acknowledgment 906 can also be in the form of an acknowledge
packet indicating successful transmission or a no acknowledge (NACK) packet indicating transmission failure. An
important aspect of this embodiment is that the overall success or failure of the multicast is encoded in a single
bit (or bits) without providing information regarding individual multicast success or failure of the targets.
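
The gate arrangement of Figure 9 reduces to a short Boolean expression, sketched here in software. The function name `merge_single_bit` and the list representation of the acknowledgments and forwarding mask are assumptions for illustration; the four-entry layout follows the four-port exemplary switch.

```python
def merge_single_bit(acks, mask) -> int:
    """Merge per-target acks into one bit using a forwarding mask.

    `acks` and `mask` are equal-length lists of 0/1, one entry per possible target.
    """
    # AND gates: inverted acknowledgment combined with the forwarding mask bit.
    and_outputs = [(1 - a) & m for a, m in zip(acks, mask)]
    # NOR gate: output is 1 only when no AND gate reports a masked failure.
    return 0 if any(and_outputs) else 1


mask = [1, 1, 1, 0]  # N5, N6 and N7 were addressed; the fourth port was not
print(merge_single_bit([1, 0, 1, 0], mask))  # 0 (N6 failed)
print(merge_single_bit([1, 1, 1, 0], mask))  # 1 (all addressed targets acknowledged)
```

Because the mask bit of an unaddressed port is zero, the absence of an acknowledgment from that port does not force the merged bit to zero.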

Other acknowledgment variations are also possible. For example, fine-grained acknowledgments
may be used in which separate bits are provided, e.g., for CRC error, permission error, buffer overflow, etc.
Thus, an exemplary system combines the individual bits, e.g., for CRC error, for all the acknowledging targets.
Again, individual bits can be merged into either a bit vector or a single bit. In the latter case, one bit of the
merged acknowledgment represents the CRC errors from all the targets, one bit represents all the permission
errors, etc. The initiator node would know whether or not all targets successfully received the packet with or
without a CRC error, or permission error, etc.
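
A fine-grained merge of this kind, with one bit per error type, can be sketched as follows. The `AckError` flag names and bit assignments are hypothetical, and the sketch assumes that a set bit means at least one target reported that error.

```python
from enum import IntFlag


class AckError(IntFlag):
    """Hypothetical per-error-type bits carried in each acknowledgment."""
    CRC = 0b100
    PERMISSION = 0b010
    BUFFER_OVERFLOW = 0b001


def merge_error_acks(acks) -> AckError:
    """OR corresponding error bits from all acknowledging targets."""
    merged = AckError(0)
    for a in acks:
        merged |= a
    return merged


acks = [
    AckError(0),                              # first target: no errors
    AckError.CRC,                             # second target: CRC error
    AckError.CRC | AckError.BUFFER_OVERFLOW,  # third target: two errors
]
merged = merge_error_acks(acks)
print(AckError.CRC in merged)         # True
print(AckError.PERMISSION in merged)  # False
```

Each bit of the merged value summarizes one error type across all targets without identifying which target reported it.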

Thus, an efficient and reliable multicast operation has been described. While described in relation to
a multiport switch, any switching medium that can effectively merge the multicast acknowledges can
effectively utilize the invention described herein.

The embodiments described above are presented as examples and are subject to other variations in
structure and implementation within the capabilities of one reasonably skilled in the art. The details provided
above should be interpreted as illustrative and not as limiting. Variations and modifications of the
embodiments disclosed herein may be made based on the description set forth herein, without departing from
the scope and spirit of the invention as set forth in the following claims.

WHAT IS CLAIMED IS:

1. A method of multicasting, comprising:

sending multicast information from a source to a plurality of targets;

sending respective acknowledgments from each of the targets, indicating receipt of the multicast 
information; 

merging the respective acknowledgments into a merged acknowledgment; and

supplying the merged acknowledgment to the source. 

2. The method as recited in claim 1 wherein the multicast information is sent across a switch to
a plurality of targets.

3. The method as recited in claim 2 wherein the respective acknowledgments are sent from the
respective targets to the switch.

4. The method as recited in claim 3 wherein the switch merges the respective acknowledgments
and forwards the merged acknowledgment to the source.

5. The method as recited in claim 4 wherein the acknowledgments are supplied in an 
acknowledgment packet encoding an identity of the acknowledging target. 

6. The method as recited in claim 3 wherein the switch is a synchronous switch and all
acknowledgments are received by the switch at the same time.

7. The method as recited in claim 3 wherein the switch is a network switch coupling a plurality 
of sources and a plurality of targets in a network. 

8. The method as recited in claim 1 wherein the merged acknowledgment is formed by
logically combining the respective acknowledgments.

9. The method as recited in claim 1 wherein the merged acknowledgment encodes the
respective acknowledges to indicate to the source which targets successfully received the multicast
information.

10. The method as recited in claim 1 wherein the merged acknowledgment indicates whether all
of the targets successfully received the multicast information, the merged acknowledgment not identifying
which of the targets successfully received or failed to successfully receive the multicast information.

11. The method as recited in claim 10 wherein the merged acknowledgment includes a single bit
indicating whether all of the targets successfully received the multicast information.

12. A networked system comprising:

a sending node;

a plurality of receiving nodes coupled to simultaneously receive multicast information sent from the
sending node during a multicast operation and coupled to provide acknowledgments
indicating whether the multicast information was successfully received; and

a switching medium coupled to supply the multicast information to the respective receiving nodes
simultaneously and to receive and combine the respective acknowledgments into a combined
acknowledgment supplied to the sending node.

13. The networked system as recited in claim 12 wherein the networked system includes a 
switched data network and the switching medium is a network switch. 

14. The networked system as recited in claim 12 wherein each acknowledgment comprises a
plurality of bits, each bit corresponding to a different node, one bit being set to indicate that a node
corresponding to the one bit successfully received the multicast information.

15. The networked system as recited in claim 14 wherein the combined acknowledgment
includes a plurality of bits corresponding to multicast targets, each bit of the combined acknowledgment that is
set corresponding to a node that successfully received the multicast information.

16. The networked system as recited in claim 12 wherein each acknowledgment comprises a
plurality of bits, each bit corresponding to one of a plurality of types of errors.

17. The networked system as recited in claim 15 wherein corresponding bits from respective
ones of the acknowledgments are combined in the combined acknowledgment, a bit being set to a first
predetermined value in the combined acknowledgment to indicate that one or more of the targets had a
particular one of the errors and the bit being set to a second value to indicate that none of the receiving nodes
had the particular one of the errors.

18. The networked system as recited in claim 12 wherein the acknowledgments from the
plurality of target nodes are provided to the switching medium at a fixed time relative to the sending of the
multicast information.

19. The networked system as recited in claim 18 wherein the combined acknowledgment is
provided to the source node at a fixed time relative to the sending of the multicast information.

20. The networked system as recited in claim 12 wherein the networked system is pipelined.



21. The networked system as recited in claim 12 wherein the switching medium combines the
acknowledgments in response to information in each acknowledgment packet that indicates a multicast
acknowledge is being sent.

22. The networked system as recited in claim 12 wherein the switching medium combines the
acknowledgments into the combined acknowledgment if the acknowledgments arrive at the same time in the
switching medium and are destined for a same source.

23. The networked system as recited in claim 12 wherein the switching medium combines the 
acknowledgments in response to having scheduled a multicast data transfer. 

24. The networked system as recited in claim 12 wherein the networked system is operable to
reserve switch paths for forwarding the acknowledgments based on switch settings used for forwarding the
multicast information.

25. The networked system as recited in claim 12 wherein the networked system includes a
plurality of hosts, each of the hosts including both a sending node and a receiving node coupled to the
switching medium.

26. An apparatus for transmitting information between an initiator node and a plurality of target
nodes, comprising:

means for multicasting information to a plurality of the target nodes from the initiator node; and

means for combining received acknowledgments indicating whether the multicast information was
successfully received, into a combined acknowledgment and returning the combined
acknowledgment to the initiator node.